1 

PaFiMoCS: Particle Filtered Modified-CS and 
Applications in Visual Tracking across 
Illumination Change 

Rituparna Sarkar, Samarjit Das and Namrata Vaswani 

Abstract 

We study the problem of tracking (causally estimating) a time sequence of sparse spatial signals with 
changing sparsity patterns, as well as other unknown states, from a sequence of nonlinear observations 
corrupted by (possibly) non-Gaussian noise. In many applications, particularly those in visual tracking, 
the unknown state can be split into a small dimensional part, e.g. global motion, and a spatial signal, 
e.g. illumination or shape deformation. The spatial signal is often well modeled as being sparse in 
some domain. For a long sequence, its sparsity pattern can change over time, although the changes are 
usually slow. To address the above problem, we propose a novel solution approach called Particle Filtered 
Modified-CS (PaFiMoCS). The key idea of PaFiMoCS is to importance sample for the small dimensional 
state vector, while replacing importance sampling by slow sparsity pattern change constrained posterior 
mode tracking for recovering the sparse spatial signal. We show that the problem of tracking moving 
objects across spatially varying illumination change is an example of the above problem and explain how 
to design PaFiMoCS for it. Experiments on both simulated data as well as on real videos with significant 
illumination changes demonstrate the superiority of the proposed algorithm as compared with existing 
particle filter based tracking algorithms. 

I. Introduction 

We request the reviewers to review the single column version (draft version) since the equations are 
formatted for that one. 

In this w^ork, w^e study the problem of tracking (causally estimating) a time sequence of sparse 
spatial signals w^ith slow^ly changing sparsity patterns, as w^ell as other unknow^n states, from a sequence 
of nonlinear observations corrupted by (possibly) non-Gaussian noise. In many practical applications, 
particularly those in visual tracking, the unknow^n state can be split into a small dimensional part, e.g. 
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(a) Outdoor Standing 



(b) Outdoor Walking 




Indoor- walking 



(c) Indoor Wall>(ing 



(a) full size image sequence (b) zoom in on the faces 

Fig. 1. Three examples of videos with significant spatial and temporal illumination variations. In the first sequence 
(outdoor-standing), the person is standing under a large tree on a very windy day. In the second one (outdoor- 
walking), the person is walking under a large tree, on a windy day. The third sequence (indoor- walking) consists 
of a person walking in a corridor past a window. 

global motion and a spatial signal (large dimensional part), e.g. illumination or shape deformation. The 
spatial signal is often w^ell modeled as being sparse in some domain. For a long sequence, its sparsity 
pattern (the support set of the sparsity basis coefficients' vector) can change over time, although the 
changes are usually slow^. 

A key example of the above problem occurs in tracking moving objects across spatially varying 
illumination changes, e.g. persons w^alking under a tree (different lighting falling on different parts of the 
face due to the leaves blocking or not blocking the sunlight and this pattern changes w^ith time as the 
leaves move); or indoor sequences w^ith variable lighting in different parts of the room, either due to the 
placement of light sources, or due to sunlight coming in through the windows that illuminates certain 
parts of the room better than others. For some examples, see Fig [T] In all of these cases, one needs to 
explicitly track the motion (small dimensional part) as well as the illumination "image" (illumination 



at each pixel in the image). As we explain in Sec |I-B[ our work is the first to demonstrate that the 
illumination image sequence for many real videos can indeed be modeled as being sparse with slowly 
changing sparsity patterns. Another example application is tracking the boundary contours of moving 
and deforming objects over time. Here again motion constitutes the small dimensional part, whereas 
deformation is the large dimensional spatial signal that can often be modeled as being Fourier sparse. 
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A. Related Work 

In recent years, starting with the seminal papers of Candes, Romberg, Tao and of Donoho 111, ||4| there 
has been a large amount of work on sparse signal recovery or what is now more commonly known as 
compressive sensing (CS). The problem of recursively recovering a time sequence of sparse signals, with 
slowly changing sparsity patterns and signal values, from linear measurements was introduced in [|5]|, |[6| 
where a solution called Kalman filtered CS (KF-CS) was proposed, and a simplification of KF-CS called 
Least Squares CS (LS-CS) was rigorously analyzed. Later work on the topic includes modified-CS Q, 
weighted |8|, modified block CS |l9l and regularized modified-CS ifTOll . Recent Bayesian approaches 
to sequential sparse estimation with time-varying supports include l[T2ll . |[T3]| , ifTSll , |[T6ll , |[T7]| . However, 
all of these works propose solution approaches for problems with linear measurement models. 

For tracking problems that need to causally estimate a time sequence of hidden states, X^, from 
nonlinear and possibly non-Gaussian measurements, Yt, the most common and efficient solution is to 
use a particle filter (PF). A PF can be used if the sequence {X^, Yt] satisfies the hidden Markov model 
assumption. The PF uses sequential importance sampling fTS'l along with a resampling step [il9i to obtain 
a sequential Monte Carlo estimate of the posterior distribution, 7r^|^(xt) := fxt\Y^.t{^t\yi'.t)^ of the state 
Xt conditioned on all observations up to the current time, Yi-t. 

In the problem that we study, part of the state vector is a discrete spatial signal and hence very 
high dimensional. As a result, in this case, the original PF lfT9l will require too many particles for 
accurate tracking and hence becomes impractical to use. As explained in ll20ll , the same is essentially 
true for most existing PF algorithms. Some of the efficient PFs such as PF-Doucet |[T8l , Gaussian PF 
II2TII . Gaussian sum filters or Gaussian sum PF [22] also cannot be used for the following reason. The 
first two implicitly assume that the posterior conditioned on the previous state, fxt\Yt,Xt-i{^t\yti xt-i), is 
unimodal or is at least unimodal most of the time. The second two assume a linear, or at least, a unimodal, 
observation model. In our problem, the observation model is nonlinear and is such that it often results in 
a multimodal observation likelihood, e.g., as explained in |20|, this happens due to background clutter for 
the illumination-motion tracking problem. If, in addition, the state transition prior of the small dimensional 
state, e.g., the motion states, is broad, which is often the case, it will result in fxt\Yt,Xt-i{^t\yt^xt-i) 
being multimodal. This fact is explained in [20] for the illumination-motion problem. Moreover, if the 
nonlinearity is such that the state to observation mapping is not differentiable, then one cannot even find 
the mode of fxt\Yt,Xt-i{^t\ytiXt-i) and hence cannot even implement PF-Doucet. This is again true 
for the illumination-motion problem. Frequently multimodal observation likelihoods and the above non- 
differentiability also mean that the extended Kalman filter, the unscented Kalman filter, the interacting 
multiple mode filter or Gaussian mixture filters cannot be used [i23i . 
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Rao-Blackwellized PF (RB-PF) El, El and PF with posterior mode tracking (PF-MT) algorithm 
ll26ll are two possible solutions for large dimensional tracking problems. RB-PF requires that conditioned 
on the small dimensional state vector, the state space model be linear and Gaussian. PF-MT relaxes 
this and only requires that, conditioned on the previous state and a small dimensional state vector, the 
posterior of the rest of the state vector (large dimensional part) be unimodal most of the time. However, 
neither RB-PF nor PF-MT exploits the sparsity of the spatial signal to be tracked. The same is true for 
more recent works on large dimensional tracking E3, EU, ED, ll30ll , as well as for works that introduce 
efficient resampling strategies |[3T1l , |[32ll . 

Other recent works in visual tracking that utilize sparsity in some fashion include |[33]| , O, |[35]| , 
EH, EH- The work of |[33]| introduces the use of sparse kernel density estimation techniques to 
obtain a kernel density estimate for the prior and posterior densities estimated by the PF at each time. 
|[34l uses sparsity for multi-task tracking by combining ideas from multi-task learning, particle filtering 
and ii minimization. The work of [|35ll projects the image into a lower dimensional space using a random 
measurement matrix and uses these in a Bayesian classifier for separating the target from the background 
at each time. The works of ESI, ll37l use a PF for target tracking, while using sparse recovery to obtain 
the best template out of a class of templates obtained from training data, target particles from previous 
time and trivial templates (identity matrix designed for occlusions and corruptions). The work of |[38]| 
uses dynamic group sparsity for robust and fast tracking. 

B. Our Contribution 

With the exception of |[3, ||6|, El, |[TOll , none of the other works discussed above exploits the fact 
that, in most large dimensional tracking problems, at any given time, the large dimensional state vector 
is usually a spatial signal that is sparse in some dictionary/basis, and over time, its sparsity pattern can 
change, but the changes are usually quite slow. For example, as shown in [6l, El, this is true for the 
wavelet coefficients of various dynamic MRI sequences. In this work, we show that the above is also 
true for the illumination image sequence. 
1) We exploit the above fact to propose a PF based tracking algorithm called Particle Filtered Modified- 
CS or PaFiMoCS. Unlike |[5l, |[6l. El, ifTOl which only solve the linear measurements' model case, 
PaFiMoCS is designed for tracking problems with highly nonlinear, and possibly non-Gaussian, 
observation models that result in frequently multimodal observation likelihoods. Many visual tracking 
problems, e.g. tracking moving objects across spatially varying illumination change, fit into this 
category. PaFiMoCS is inspired by RB-PF and PF-MT. Its key idea is to importance sample from 
the state transition prior for the small dimensional state vector, while replacing importance sampling 
by slow sparsity pattern change constrained posterior mode tracking for recovering the sparse spatial 
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signal. We introduce simple, but widely applicable, priors for modeling state transitions of sparse 
signal sequences with changing sparsity patterns that result in an efficient PaFiMoCS algorithm. 

2) In many recent works fS9l . BOll , |[20ll , the illumination image is often represented using the first 
few lowest order Legendre polynomials. However, our experiments with the dictionary of Legendre 
polynomials (henceforth referred to as the Legendre dictionary) are the first to demonstrate that for 
many video sequences with significant illumination variations, the illumination image is approxi- 
mately sparse in this dictionary and, in fact, its sparsity pattern includes many of the higher order 
Legendre polynomials, and may not include all the lower order ones. Moreover, over time, usually 
the sparsity pattern changes quite slowly [see Sec |IV-C| )]. 

3) We show how to design PaFiMoCS for tracking moving objects across spatially varying illumina- 
tion changes. Experiments on both simulated data as well as on real videos involving significant 
illumination changes demonstrate the superiority of the proposed algorithms. 

For tracking moving objects across spatially varying illumination changes, we use a template-based 
model taken from ll40i . Il20ll with a three-dimensional motion model, that only models x-y translation and 
scale. We use this because it is simple to use and to explain our key ideas. However, we should point out 
that the proposed algorithm can very easily be adapted to other representations of the target e.g. feature 
based approaches. A similar approach can be also be developed to jointly handle appearance change due 
to illumination as well as other factors like 3D pose change, by using the more sophisticated models of 
recent work flTll . ll44ll . ll45]| . Similarly, illumination can also be represented using other parameterizations 
such as those proposed in ll46ll. BTll. BSll, ||49]|. 

C. Paper Organization 

The rest of the paper is organized as follows. We give the notation and problem formulation in Sec 
[ll| Sec |II-C| introduces simple but widely applicable priors for modeling state transitions of sparse 



signal sequences with changing sparsity patterns. We develop the PaFiMoCS algorithms in Sec III 



The illumination-motion tracking problem and PaFiMoCS algorithms for it are described in Sec IV 



Experimental results, both on simulated and real video sequences, are given in Sec |Vj Conclusions and 
future work are discussed in Sec |Vll 

II. Notation, Problem Formulation and State Transition Models 

A. Notation 

The probability density function (PDF) of a random variable (r.v.) Y is denoted by while the 

conditional PDF of of r.v. Y given another r.v. X is denoted by k)- The notations E[y] and E[y |X] 

are used to denote the expectation of Y and the conditional expectation of Y given X respectively. The 
subscript t denotes the discrete time index. The notation Zt f{z) means that the sequence of r.v.'s 
Zi, ^2, . . . ^t, ^t+i, ... are mutually independent and identically distributed (i.i.d.) with PDF f{z). If the 
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sequence of ZfS are discrete r.v.'s then the same convention applies with PDF replaced by probability 
mass function (PMF). If Zt is an n length vector, the notation {Zt)i f{z) means that {Zt)i's for all 
i = 1, 2, . . . n and for alH = 1, 2, . . . , t + 1, . . . are i.i.d. with PDF or PMF f{z). 

The notation Af{a] /i, S) denotes the value of a Gaussian PDF with mean fi and covariance matrix 
S computed at a. The notation X ^ M{iJi^ S) means that the r.v. X is Gaussian distributed with mean 
fi and covariance matrix H. Similarly, the notation Unif(a; ci, C2) denotes the value of a uniform PDF 
defined over [ci, C2] computed at a while X ^ Unif(ci, C2) means that the r.v. X is uniformly distributed 
in the interval [ci,C2]. The notation S ^ Ber(T,p) means that the set S many contain any element of 
the set T with probability p (and may not contain it with probability 1— p) independent of all the other 
elements of T. 

The notation \\h\\k is used to denote the Ik norm of vector h. For any set T and vector 6, (6)^ is used 
to denote a sub- vector containing the elements of h with indices in T. For a matrix A, {A)t denotes the 
sub-matrix obtained by extracting columns of A with indices in T. We denote the complement set as 
i.e., := {i : i ^ T}. The symbols U and fl denote set-union and set-intersection respectively and the 
symbol \ denotes the set difference, i.e. Ti \ T2 := Ti fl . For a set T, |r| denote the cardinality of a 
set, but for a scalar x, \x\ denotes the magnitude of x. 

The notation vec(.) denotes the vectorization operation, it operates on a m x n matrix to give a vector 
of size mn by cascading the rows. The Hadamard product (the operation in MATLAB) is denoted 
by 0. The function round(Z) operates element-wise on a vector or matrix Z to output a vector or matrix 
with integer entries closest to the corresponding elements of the vector or matrix. We use I to denote the 
identity matrix and we use 1 or to denote vectors with all entries as 1 or respectively. M ^ denotes 
the transpose of a matrix M. 

B. Problem Formulation 

The goal of this work is to develop algorithms to recursively recover a time sequence of states, Xt, 
from noise-corrupted and nonlinear measurements, Yt, when the state Xt can be split into two parts, a 
large dimensional part, L^, and a small dimensional part, Ut, with the following properties 

1) is a discrete spatial signal, that is sparse in some known dictionary, and 

2) the sparsity pattern of Lt changes slowly over time. 
Mathematically, this means the following. The state Xt can be split as 

Ut 

Xt = 

Lt 
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where {Ut)n^xi is a small dimensional state vector and {Lt)nixi, with ni » Uu, is a discrete spatial 
signal that is sparse in some known dictionary, i.e. 

Lt = (1) 

where (A^)^^xi is a sparse vector with support set Tt, i.e. 

T, := support(A,) = {j : (A,), ^ 0}. (2) 

The ni x riA dictionary matrix $ can be tall, square or fat. Often in video applications, ni is very large, 
and hence for computational reasons, one uses a tall dictionary matrix $. 

Notice that if Tt and (At)Tt are known, then Lt is known. Thus, one can as well let the state vector 
be Xt — \Uj ^T^ ^ {Kt)TjY or for simplicity, just Xt — \Uj ^TJ ^ hJY - ^^^^ ^^^^ definition of 
the state vector in the rest of this paper 

The m dimensional observation vector, Yt, satisfies 

Yt := h{Ut, Lt) + Zt, Zt fz{z) (3) 

i.e. the observation noise, Zt, is independent and identically distributed (i.i.d.) over time, with PDF at 
any time given by fz{z). In many situations, this is Gaussian. However, often to deal with outliers, 
one models Zt as a mixture of two Gaussian PDF's, one which has small variance and large mixture 
weight and the second with large variance but small mixture weight. For the above model, the observation 
likelihood, OL(Ut^Lt), can be written as 

OL{Ut,Lt) fY4u,,LAyt\Ut,Lt) = fz{Yt - h{Ut,Lt)) (4) 

More generally, sometimes the observation model can only be specified implicitly, i.e. it can only be 
written in the form 

g{Yt,Ut,Lt) = Zt, Zt'-^- fz{z) (5) 

As we see in Sec [TV} this is the case for the illumination-motion tracking problem. For ([5]), the observation 
likelihood, OL{Ut^Lt), becomes 

OL{Ut,Lt) = fz{9{Yt,Ut,Lt)) (6) 
Notice that ([3]) is a special case of ^ with giYt^Ut^Lt) = Yt — h{Ut^ Lt). 
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C. State Transition Models 

In the absence of very specific model information, one can adopt the following simple state transition 
models. These can be used to impose slow sparsity pattern change as well as slow signal value change. 

We assume the following simple support change model for Tf. 

Tt = Tt-iUAt\Ru where 
At Ber(rti,Pa) 

Rt Ber(r,_i,p,) (7) 

Here At denotes the set added to the support at time t, while Rt denotes the set that is removed 
from the support at time t. Each of them is i.i.d. over time. Also, given T^-i, At and Rt are assumed 
to be independent of each other. Since At C T^_^, and Rt Q thus At and Tt-i are disjoint and 

\Tt\ = \Tt-i\-\Rt\ + \At\.Thus,E[\Tt\] = E[\Tt-i\]+E[^^^^^ notice that, if 5 - Ber(r,p), 

then E[S\T] = \T\p and so E[S] = E[E[S\T]] = E[\T\]p. Thus, E[\At\] = {nx - E[\Tt-i\])pa and 
E[\Rt\]=E[\Tt-i\]pr. 

In most applications, it is valid to assume that the expected support size remains constant, i.e. E[|rt|] = 
E[|rt_i|] = s. This is ensured by setting pr = {nx — s)pa/s so that E[|At|] = E[|i?t|]. Also, slow support 
change means that E[|i2t|] = E[|At|] is small compared to E[|rt|] = 5. This is ensured by picking pa to 
be small compared with s/{nx — s). 

In the absence any of other model information, we assume the following linear Gaussian random walk 
model on (At)^;: 

(At)T, = (At_i)T, + Kt)T,, Kt)T/'^^'Ar(0,af/) 

(A,)tc = (8) 
Similarly, in the absence of any other specific information, we also assume a similar model on Ut, i.e. 

Ut = Ut-i + vu^t. Vu^t''^'M{Q,T.u) (9) 

If the only thing that is known is that the values of {^^t)Tt Ut change slowly, then the above linear 
Gaussian random walk model is the most appropriate one. However, as far as the proposed algorithms 
are concerned, they are also applicable with minor changes for the case where {^^t)Tt — q{M-i^ ^i,t) for 
any arbitrary but known function q{.). 

With the above model, the state transition prior, fxt\Xt-i{^t\^t-i)^ corresponding to the above state 
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transition models can be written as follows. Recall that Xt = [Ut^Tt^At]. 

SW{XIXI_,) f^^^^^_^{Xl\Xl_,) 

= STP(r;;r/_i)STP(A^;A^_i,r;) STP(t/,^t/ti), where (10) 

STP(r/;T/_i) Pr(T, = T/|r,_i = r/_i) 

= Pr(A, = (T/ \ T/_i)|T,_i = TU.Rt = (T/_i \ r/)|T,_i = T/_i) 

= AA((ADT^(AU)T^^?/) (12) 

= AA(t/,^t/ti,5^u) (13) 



In the above, ^ follows using the following facts: (i) At = Tt\ Tt-i and Rt = T^_i \ Tt\ (ii) if 
5 - Ber(r,p), then Pr{S = S^IT = T') = - and (iii) At and i?t are independent 

given Tt-i. Thus, from (R, Pr(At = A^|Tt_i = T') = pL^''(l - Pa)'^^^^''"'^ ' and similarly, Pr{Rt = 



i?^|rt_i = T^) = Also, (Il2|) and (llSl) follow directly from M and M respectively. 



III. Particle Filtered Modified-CS 
As explained in the introduction, most existing PF algorithms cannot be used for our problem, since 

we would like to deal with (a) multimodal observation likelihoods and with (b) the state consisting of a 

sparse spatial signal with unknown and slowly changing sparsity patterns, in addition to another small 



dimensional vector. Below, in Sec III-A[ we provide a quick review of particle filter with mode tracker 



(PF-MT) and discuss its limitations. Next, in Sec |III-B[ we explain how to address these limitations. 
The result is our proposed algorithm called particle filtered modified-CS (PaFiMoCS). PaFiMoCS-slow- 



support-change is described in Sec III-C In Sec III-D[ we give one set of the assumptions under which the 



cost function to be minimized by PaFiMoCS is convex. We discuss the computational cost of PaFiMoCS 



and the potential for its parallel implementation in Sec |III-E| Two approaches to deal with outliers, e.g. 
occlusions, are described in Sec |III-F[ 

A. A review of particle filter with mode tracker (PF-MT) and its limitations 

The PF-MT algorithm |[26ll splits the state vector Xt into Xt = [Xj^^^Xj^^Y where Xt^s denotes the 
coefficients of a small dimensional state vector, which can change significantly over time, while Xt^r 
refers to the rest of the states which usually change much more slowly over time. PF-MT importance 
samples only on X^^^, while replacing importance sampling by deterministic posterior Mode Tracking 
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Algorithm 1 PF-MT: Particle Filter with posterior Mode Tracker 
For alH > do 

1) For each particle z = 1, 2, . . . ripf: Importance sample Ut from its prior: ^ A/^(0, T^u) 

2) For each particle i = 1,2, . . .ripf: Mode track At, i.e. compute the mode of the posterior of A^ 
conditioned on and U^, i.e. compute AJ as the solution of 

IIA — A^ IP 

m^inC(A) where C{A) := - logOL(t/^ $A) + 2af~^ ^^^^ 

where OL(.) is defined in 

3) For each particle z = 1, 2, . . . ripf: Compute the weights as follows. 

wi oc wU OL{Ui, $A) Af{Ai; Aj.^, afl) 

4) Resample and reset weights to I /ripf. Increment t and go to step 1. 



(MT) for Xt^r- Thus the importance sampling dimension is equal to only the dimension of Xt^s which is 
much smaller than that of Xf. The importance sampling dimension is what decides the effective particle 
size and thus PF-MT helps to significantly reduce the number of particles needed for accurate tracking 
in a large dimensional problem. PF-MT implicitly assumes that (i) the posterior of Xt^r conditioned on 
the previous state, Xt-i and on Xt^s ("conditional posterior"), i.e. fx,^^\Xt-i,Xt,sXti^t,r\Xt-i, Xt^s.Yt), 
is unimodal most of the time; and that (ii) it is also narrow enough. Under these two assumptions, it can 
be argued that any sample from the conditional posterior is close to its mode with high probability ll26l 
Theorem 2]. 

PF-MT can be applied to our problem if we replace ^ and ([S]) by ([S]) with = [1, 2 . . . nx], i.e. we 
do not use the sparsity of A^. Then, with Xt^s = Ut and Xt^r = A^, the PF-MT algorithm is as given in 
Algorithm [1} Notice that the conditional posterior in this case satisfies 

fA,lx,.,,u.,Y.i^t\XU,Ui,Yt)^OLiUi,^At) MiAt;AU,afl). 

where OL(.) is defined in Thus, the cost function to be minimized in the MT step is given by (14). 

However, since PF-MT does not exploit the sparsity or slow sparsity pattern change of A^, with 
probability one, it results in a dense solution for A^, i.e. the energy gets distributed among all components 
of Af. This becomes a problem in applications where A^ is indeed well approximated by a sparse vector 
with changing sparsity patterns. An alternative could be to assume ([8]) on a selected fixed subset of A^, 
i.e. fix Tt — Tq. For example, if $ is the Fourier basis or a Legendre dictionary, one would pick the 
first few components as the set Tq. This was done in |20| for illumination. This approach works if most 
energy of Lt does indeed lie in the lower frequency (or lower order Legendre) components, but fails if 
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there are different types of high-frequency (higher order Legendre) spatial variations in Lt over time[^ 
For many of the video sequences we experimented with for motion tracking across illumination change, 
this was indeed true, i.e. higher order Legendre polynomials were part of the Legendre support set of the 
illumination image [see Fig[3|, and as a result PF-MT implemented this way also failed [see Figs|5j[6|. 

B. PaFiMoCS: Particle Filtered Modified-CS 

To address the above limitation, one can utilize the sparsity and slow sparsity pattern change of the 
large dimensional state vector, Lt, in a PF-MT type framework as follows. The key idea is to add a term 
motivated by Modified-CS Q in the mode tracking optimization step and make corresponding changes 
in the other steps. We proceed as follows. We let Xt^s = \Ut^Tt] and Xt^r = A^. In the importance 
sampling step, we sample Ul and from their state transition priors given in Sec 



II-C 



In the cost 



function that we minimize for the mode tracking step, we add a term of the form ||A7^c||i with T — Tl, 
i.e. it computes as the solution of minA C{X) where 

C(A) :=-logOL(t/^$A) + /3^^^— ^^i^+7||A(r,).||i (15) 



where OL(.) is defined in ([6|). Solving \\5\ is a tractable approximation to trying to find the vector K\ 
that is sparsest outside the set (i.e. the vector with the smallest number of new support additions 
to T/) among all vectors A that satisfy the observation model constraint (often referred to as the data 



constraint) and are "close enough" to the previous estimate, Thus solving (15) ensures that 

the support of the solution, A^, does not change too much w.r.t. the predicted support particle T/. The 
larger the value of 7, the smaller will be the support change. Notice that ( [T5] ) can also be interpreted as 
an adaptation of the regularized modified-CS idea which was originally introduced for linear problems 
with slow sparsity pattern and signal value change in |10|. 

A second change that we have w.r.t. the original PF-MT idea is that we threshold on K\ in order to 
get an updated estimate of the support We compute it as := {j : |(A^)j| > a} for a small nonzero 
threshold a. Finally, as in PF-MT, the weighting step multiplies the previous weight by a product of the 
observation likelihood and the state transition prior of Xt^r = A^. We summarize the resulting algorithm 
in Algorithm |2] We refer to it as Particle Filtered Modified-CS (PaFiMoCS). In situations where A^ is 
indeed well approximated by a sparse vector with changing sparsity patterns, this significantly improves 
tracking performance as compared to PF-MT as well as other PF algorithms. We demonstrate this for 
the illumination-motion tracking problem in Sec. [Vj 

^We show a few 2D Legendre polynomials (images of Pk{i,j) defined in ( [21I for a few values of k) in Fig [2] As can be 
seen, higher order Legendre polynomials roughly correspond to higher frequency spatial variations of intensity. 
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Algorithm 2 PaFiMoCS: Particle Filtered Modified-CS 



Input: Yt 

Output: UlTi, Al wl 

Parameters: (algorithm) Upf, a^^^l3, (model) T^u^crf ^PaiPn fz{z) 
For alH > do 

1) For each particle i = 1,2, ...np/: Importance sample Ut from its state transition prior: Ul ^ 

2) For each particle i = 1,2, ...rip/: Importance sample Tf from its state transition prior: = 
r/_i UAi\Ri where - Ber((T/_i)^p«) and i?^ - Ber(r/_i,p^). 

3) For each particle i = 1,2,... ripf'. Mode track with imposing slow sparsity pattern change, i.e. 



compute Al as the solution of (15) with OL(.) as defined in ([6|). 

4) For each particle i = 1, 2, . . . ripf: Update as 

r/:= {j:\{Ai)j\>a} 

5) For each particle i = 1, 2, . . . ripf'. Compute the weights as follows 

wl oc wU OUUl^Al) STP(Aj; Aj_i,7^ 



where OL(.) is defined in ([6|) and STP(A^; A^_i, T/) is defined in ([12|). 
6) Resample and reset weights to I /ripf. Increment t and go to step 1. 



We should point out here, that from the Bayesian perspective, in ( [T5| ), the multiplier /3 should be one 
and 7 should be chosen by assuming that the elements of {At)T^ are i.i.d. Laplace distributed and the 
Laplace parameter can be estimated by maximum likelihood estimation. For details, see fJ^, Appendix C]. 
However, it is well known that in solving sparse recovery problems, the weights given by the Bayesian 
perspective are not always the best ones to use. A more practically useful approach to selecting these is 
explained in [10, Section V-B]. 

C. PaFiMoCS-slow -support-change: PaFiMoCS for large sized problems with slow support changes 

For certain problems with very large sized spatial signals, L^, the support size of its sparse coefficients 
vector. At, can also be very large. In these situations, if we keep Tt as part of the importance sampling 
state 5, it will require a very large number of particles, thus making the algorithm impractical. However, 
if the support changes slowly enough, then we can include Tt as part of Xt^r^ i-^. we let Xt^s = Ut and 
Xt^r = \Tt^At]. With this, the mode tracking step would ideally have to compute A^, A\ and R\ by 
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solving 



min C(A,A,R) where 
A,A,R ^ ' ' ^ 



J|(A - AJ_;l)t;_J|2 |.|, Pa Pr 



C(A, A, R) := - log OL{Ul $A) + ^- ; , ^'"'^ - | A| log - \R\ log (16) 

2af I- Pa 1 - 

and setting = T^^_^ U \ But the above minimization will require a brute force approach of 
checking all possible sets A and R and will thus have complexity that is exponential in the support 
change size. Thus, it cannot be solved in any reasonable time. However, we can instead compute A J by 
solving minA C{A) where 

C(A) := - logOL(^/, $A) + /3 "^"^ ~ ^2af^^~' + V,-.)- IK d^) 
and then threshold on AJ to get the current support particle T^. Since pa < 0.5 and pr < 0.5, the last two 



terms of (16) are increasing functions of l^l and \R\. If the last term of (16) were ignored, doing the 
above can be interpreted as its convex relaxation: it helps to find the vector AJ with the smallest number 
of support additions to r^^_i, i.e. the smallest l^^l, while also keeping the first two terms small. If the 



last term of ([16]) is not ignored, then it is not clear what its convex relaxation would be. 

Since the support set Tt is now a part of Xt^r, we also need to include a term proportional to its state 
transition prior in the weighting step, i.e. we need to also multiply by ( [TT] ) in the weighting step. We 
summarize the resulting algorithm in Algorithm [3] We refer to it as PaFiMoCS-slow-support-change. We 
should note that this works only as long as the support changes at all times are slow enough. 

D. Convexity of the Mode Tracking Cost Function 

Consider the cost function C(A) that we need to minimize in either of the above PaFiMoCS algorithms 
(Algorithm [3] or [2]). If this is convex, any minimizer is a global minimizer. Also, there exist a large number 
of efficient algorithms, such as interior point methods as well as other more recent efficient algorithms, 
for minimizing convex functions. There are also multiple software packages such as CVX (CVX: Matlab 
software for disciplined convex programming, |http://cvxr.com/cvx| ), that contain efficient implementations 
of these algorithms. If the cost function is not convex, as long as it is differentiable, one can still try 
to use the convex solvers, but one will only find a local minimizer that is closest to the initial guess 
provided. 

One simple set of sufficient conditions for C{K) to be convex, that are also often satisfied in practice, 
are as follows. 

1) Zt is Gaussian distributed, i.e. fz{z) is a Gaussian PDF and 

^This follows by including the negative logarithm of state transition prior of Tt given in |ll| in the cost. 
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Algorithm 3 PaFiMoCS-slow-support-change: PaFiMoCS for slow support changes 
Input: y^, Output: UlTi.Ai wi 

Parameters: (algorithm) a,7,/3, (model) Y^u^^'i ^Pa^Pr^ fz{z) 
For alH > do 

1) For each particle z = 1, 2, . . . n^/: Importance sample Ut from its prior: Ul ^ A/'(0, T^u) 

2) For each particle i = 1,2,... n^/: Mode track A^, Tt with imposing slow sparsity pattern change, 



i.e. compute A^ as the solution of (17) with OL(.) as defined in ([6|). Compute by thresholding 
on AJ, i.e. compute 

Ti:= {i:|(Aj),|>a} 
3) For each particle z = 1, 2, . . . n^/: Compute the weights as follows 

^(X«;i_iOL(C/;,$Aj) STP(Aj;Aj_i,r/) STP(r/;r;_i) 



where 0L(.) is defined in (|6|), STP(AJ; AJ_i, T/) is defined in ([12|| and STP(rt^r/_i) is defined in 
4) Resample and reset weights to l/ripf. Increment t and go to step 1. 



2) The observation model is such that g{Yt^ ^7^^ $A) is an affine function of A, i.e. 

g{Yt, Ul $A) = gu^Yu Ui)^A + ^.,2(>t, ^t^) (18) 



The illumination-motion tracking problem that we describe in Sec |IV| is an example of a problem where 
the above assumptions hold. 

A convex cost function ensures that any minimizer that we find is a global minimizer. Also, under 
certain extra assumptions, the minimizer will be unique. For example, if the above two assumptions hold, 
and if the matrix B := gu^i(Yt^U^)^ satisfies the conditions imposed on the measurement matrix in ifTOl 
Theorem 1], the minimizer will be unique. 

E. Computational Cost and Parallel Implementation 

Compared with most other PF algorithms (except PF-MT, RB-PF or some others), PaFiMoCS has the 
advantage that it requires much fewer number of particles for tracking a large dimensional spatial signal 
and other states over time. However, like PF-MT, PaFiMoCS also needs to solve a convex optimization 
problem for each particle. This can be make its implementation speed quite slow. However, notice that, 
like PF-MT, PaFiMoCS is ideally suited for a parallel implementation since the convex optimization 
needs to be solved for each particle independently of all the others. A parallel implementation would 
easily enable PaFiMoCS to run in real-time. 
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F. Dealing with Outliers 

Outliers in the observation noise occasionally occur in almost all practical tracking problems. An 
outlier is a very large value of the noise that occurs with very small probability. However since it is a 
very large value, even one outlier occurrence can cause the tracker to lose track and thus any practical 
tracker needs to be able to deal with outliers. In computer vision, the most common reason for an outlier 
is an occlusion. 

There are two commonly used ways to model outliers. The traditional approach [ [50l is to let the 
observation noise Zt be a Gaussian mixture with one large variance Gaussian and one small variance 
one, i.e. 

n 

Zt fz{z) -\[[{^- PoutWiz,; 0, a^) + PoutAA(z,; 0, a^.j 

i=l 



with cTq^^ » cr^, e.g. cTq^^ = lOOOa^ and with the outlier probability pout being very small. Models similar 
to this one were used in |26 | and in |20|. However, one main disadvantage of using this is that it makes 
the cost function that we need to minimize non-convex. 

A more recent and also more efficient approach to model outliers is treat them as sparse vectors iBTIl . 
Thus, one lets 

Zt = Zt^g + Ot 

where Zt^g ^ A/'(0, a^/) is the usual Gaussian noise while Ot is an unknown sparse vector. With this 
model, in the mode tracking step of Algorithm [3] or Algorithm [2j one solves 

min C(A,0), where 

rf^n^ ll^(>^t,^^^A) - 0||i , ||(A-AU)T|li , ^,|, . ^n^nw 



Notice that if g{Yt, ^7^^ $A]) is an affine function of A, i.e. if it satisfies (18), then this cost function is 
still convex. 

IV. Visual Tracking across Spatially varying Illumination Changes 
In this section, we show how visual tracking across spatially varying illumination change is an example 

of the general problem studied here and how to design PaFiMoCS for it. We give the observation model 



for this problem in Sec IV-A below, followed by the state transition model in Sec IV-B In Sec IV-C 



we demonstrate that for many video sequences, the illumination image is indeed sparse in the Legendre 
dictionary and most of the time its sparsity pattern does change slowly over time. The PaFiMoCS and 



PaFiMoCS-support algorithms for this problem are summarized in Sec |IV-D 
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k 



k 



= 20 



A: = 23 



Fig. 2. We show images of Pk{i^j) defined in (21) for a few different values of k. Notice that higher values 
of k correspond to higher spatial frequency of intensity change. Also, even values of k correspond to x direction 
variation (constant along y axis) whereas odd values of k corresponds to y direction variation. 



A. Observation Model 

The state in this case consists of the n^^ x 1 motion state, Ut, w^hich is the small dimensional part, 
and the n/ x 1 illumination "image" (w^ritten as 1-D vector), Lt. In this paper, w^e use a template-based 
tracking framework, similar to the one in ll20l , with a simple three-dimensional motion model, that only 
models x-y translation and scale, i.e. Ut = [u^^ uf, st]^ where st refers to scale and uf and uf refer to x 
and y translation. Thus Uu = 3. The illumination image is represented in the Legendre dictionary. Thus, 
our final state vector is Xf = [U^ ^ Aj]^ where Ut is the x 1 motion state, is the x 1 
Legendre coefficients' vector of illumination and Tt is the support set of A^. 

The observation model used in this work is taken from ll20ll . We repeat it here for completeness. The 
initial template is denoted by Iq. We use ROl{Ut) to denote the region-of-interest (ROI) in the current 
image, i.e. it contains the pixel locations of the template in the current frame. The total number of pixels 
in Iq is ni and thus the size of the set ROl(Ut) is also n/. The total number of pixels in the entire 
image, y^, is m > n^. At time t, ROl{Ut) is obtained by scaling and translating the pixel locations of 
the original template Iq, i.e. by using ( [20| ). The illumination image is represented using the Legendre 
dictionary defined in ( [2T] ). The pixel intensities of the pixels in the ROI are modeled as being equal to 
those of the initial template plus the initial template scaled pixel-wise by the illumination image plus 
some randomness that is modeled as Gaussian noise (in the absence of occlusion). The pixels outside the 
ROI, i.e. those in ROl(Uty, are assumed to be due to clutter and we simply model them as being i.i.d. 
uniformly distributed between zero and 255. This model can be mathematically specified as follows. 

Yt{ROl{Ut)) = vec(/o) + (vec(/o)0L,) + ^t = vec(/o) + $A, + Z,, -AA(0,a^/), 

YtiROliUtY) = Zt^c. (^t,c)/-^'-Unif(0,255) (19) 



January 9, 2013 



DRAFT 



17 



where "i.i.d." means i.i.d. spatially and over time; 

ROl{Ut) := {round(Jit/t + lio),round(J2^7t + ijo)}, where 
Ji := [1 (io - lio)], J 2 := [0 1 (jo - ijo)], 



(20) 



io and jo contain the x coordinates and the y coordinates respectively of the initial template /q, io 
mean(io) = jj J2kLi[io]k, jo := mean(jo) = jj J2^=i[jo\k denote the x and y location of the centroid 
of /q (since the template is of size ni, io and jo are also ni length vectors and Ji, J2 are x 3 matrices); 
and 



[vec(/o Po), vec(/o P2d)]i where 



1 if /c = 

Pk{iJ) = {p^{i) if fc = l,3,5,...(2d-l) (21) 

2 

pk{j) if /c = 2,4,6, .2d 

and pk{-) is the Legendre polynomial of k^^ order. Thus, $ is an ni x matrix with nx = 2(i + 1. 
We show images of Pk{i^j) for a few values of k in Fig [2] As can be seen, higher order Legendre 
polynomials roughly correspond to higher frequency spatial variations of intensity. In our experiments, 
we used d = 20, so that nx = 41, while the template size, ni, was much larger. Thus, $ was a tall 
matrix. 

' y,(ROI(t/0)-vec(/o)-$A, 



The above model is of the form (5) with g{Yt, Ut, ^At 



0L(.) is of the form ^ and can be 



explicitly written as 



Yt(ROl{Ut 



OL{Uu $A,) = M{[Yt{m\{Ut))] - vec(/o) - $At; 0, all) 



V255 



m—ni 



Thus, 



(22) 



Recall that m is the size of the entire image Yf while ni is the size of the template. 
B. System Model 

In the absence of any extra information about the motion, we assume a simple linear Gaussian random 
walk model on the motion state Ut, i.e. we assume that Ut satisfies ([9]) with J^u being a diagonal matrix. 



As we show in Sec |IV-C| below, the Legendre coefficients vector for illumination, A^, is an approximately 
sparse vector with support that usually changes slowly over time. Hence, the models given in Sec |ll| apply 
for A^ as well: its support, Tf, satisfies (|7]) and A^ satisfies ([8]). Thus, the corresponding state transition 
priors are also the same as those given there. 
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2 3 4 5 6 / 8 9 10 11 12 13 U 15 16 1/ 'l 2 3 4 5 6 / 8 3 10 11 12 13 H 15 16 1/ 18 13 20 'l 2 3 4 5 6 / 8 9 10 11 12 13 U 15 16 1/ 18 19 20 



outdoor- Standing outdoor- walking indoor- walking 

(c) entries in the support set at various times (shaded in black) 

Fig. 3. Sparsity and slow sparsity pattern change of the illumination image sequence for videos of Fig [T] in the 
Legendre dictionary. 

C. Verifying Illumination Image Sparsity and Slow Sparsity Pattern Change 

We used the video sequences shown in Fig [T] to study the sparsity and sparsity pattern change of 
illumination images over time in the Legendre dictionary and in the Fourier basis. This was done as 
follows. 

1) We took a 20 frame sub-sequence of each video. In each frame, a rectangle was hand-marked around 
the person's face. The rectangular region from the first frame served as the initial template Iq. 

2) Let It denote the rectangular region containing the person's face at time t. This was first resized 
to make it the same size as Iq. We then computed the maximum likelihood estimate of for the 
model vec(/t) = vec(/o) + $At + Zt where Zt A/'(0, afl) as At = ($^$)"^$^vec(/t - Iq). 
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(a) support size 



outdoor- standing 




outdoor-walking 
(b) support change sizes 




Fig. 4. Support size and support change sizes of the illumination image sequence for the three videos of Fig [T] in 
the Fourier basis. 



3) The vector computed above is not exactly sparse, but approximately so. We let Tt be its ^^/o- 
energy support |7|, i.e. Tt := {j : (A^)^ > a} w^here a is the largest real number for w^hich 
Tt contains at least 99% of the signal energy. This is computed as follow^s: sort elements of At in 
decreasing order of magnitude and keep adding elements to Tt until X]jGTt(^Oj — ^-^^ 
We refer to a as the 99%-energy threshold. 



It I 

4) In Fig 3(a) w^e plot the support size normalized by the length of A^, i.e. — for all the three 

Tlx 



sequences. 



5) In Fig |3(b)[ w^e plot the number of additions to, and removals from, the support normalized by the 
support size, i.e. w^e plot ^^']tIC^ ^^^T^* 



6) In Fig |3(c)[ w^e show^ w^hich Legendre polynomials are contained in the support set Tt at various 
times t in each of the video sequences. We shade in black the squares corresponding to indices that 
are contained in Tt, w^hile leaving blank the indices that are not in Tt. 
We first did the above for the three video sequences show^n in Fig[TJ u^ith using the Legendre dictionary, 
i.e. $ w^as computed using ( [2T] ) w^ith d = 20. The first (outdoor-standing) is of a person standing under 
a tree on a very w^indy day. Variable lighting falls on different parts of her face because leaves block 
some of it. As the leaves move, this pattern changes w^ith time. The second (outdoor-w^alking) is of a 
person w^alking under a tree w^ith variable amounts of light falling on various parts of her face as she 
moves under the moving leaves. The third (indoor-w^alking) has a person w^alking along a corridor across 
a window with varying illumination coming through the window. The lighting is significantly more when 
he gets near the window and again lesser when he moves away from the window. Moreover, since the 
window is on the side of the person, the amount of light falling on different parts of his face is quite 
different (higher frequency spatial variations of illumination exist) and also changes with time. 
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Fig. 5. Location error plots for the outdoor-walking video. PaFiMoCS-SSC refers to PaFiMoCS- slow- support- 
change. 



As can be seen from Fig |3(a)[ for all three sequences, the support size is between 30-60% of the 
length of At, nx. Thus, is indeed approximately sparse. Moreover, except at a few^ time instants, the 
number of support changes (additions or deletions) is usually under 35% of the support size. For the 
indoor-w^alking sequence, at certain times (w^hen the person moves tow^ards the w^indow^ from a darker 
region of the corridor or vice versa), the number of support changes is much larger than this. 



From Fig |3(c)[ wq can see that the support set of all sequences does indeed contain many of the 
higher order Legendre polynomials. Polynomials up to the 16^^ order (k = 32) are present in all the 
three sequences. This explains why PF-MT run with only a 7-dimensional A^ {d = 3) fails to track these 
sequences [see Figs [5j |6} [7|. This also indicates the need for using a Legendre dictionary with d > 16. 
To allow for occasional tracking errors, we used a dictionary with d = 20 in our experiments. 

We also repeated the above experiment with $ being the Hadamard product of Iq and the discrete 
Fourier transform matrix. The support size and support change size in the Fourier basis are plotted in Fig 
|4j As can be seen, for all the videos, the support size was much larger, between 70-90% of nx. However, 
the support change size was smaller, between 10-20% of the support size. Since the support size was so 
large, we did not use the Fourier basis in our experiments. 

D. PaFiMoCS and PaFiMoCS-slow-support-change algorithms 

With the state space model given above, the PaFiMoCS algorithm of Algorithm |2] and the PaFiMoCS- 
slow-support-change algorithm of Algorithm |3] for faster support changes apply directly. In either case, 
the cost function to minimize simplifies to 



C(A) = 



|y,(ROI(C/»)) - vec( Jo)-$A||i ^ ^£(A 



a(*) l|2 



2ai 



+ 7||(AtOIIi 
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with T = in case of PaFiMoCS and T = Tl_^ in case of PaFiMoCS-slow-support-change. Also, in 



the weighting step, OL{Ul^ ^^t]) is computed using (22). 

Notice that for this problem, g{Yt^Ut^^^^t) is indeed an affine function of and the noise Zt is 
Gaussian. As a result, the above cost function is convex in A and thus easy to minimize using any of 
the standard convex solvers. We used CVX (CVX: Matlab software for disciplined convex programming. 



http://cvxr.com/cvx) in our implementations. 



In order to deal with occlusions (outliers), one can use either of the approaches described earlier in 



Sec. III-F As explained there, using the second approach will retain the convexity of the cost function 



to minimize. With using that, the cost function to minimize becomes 

r(.n^ ll>^t(RQI(^f^))-vec(/o)-$A-0||i , JI(A-Aa)T||i 

^(A,0) = + 13 +7||AtH|i +7 ll^^lli 

V. Experimental results 
We show two types of experiments. In the first experiment, we took a face template and simulated a 



video sequence according to the models specified in SecjlVj This allowed us to evaluate our algorithm and 
compare it against other PF methods for a dataset for which ground truth is available and for which one 
can generate multiple realizations of the sequence to compute the Monte Carlo based average performance. 
In the second experiment, we did comparisons on real video sequences containing significant spatial and 
temporal illumination changes. For one sequence in this experiment, we hand-marked the target's location 
in a 20-frame subsequence and treated this as "ground truth" for quantitative performance evaluation. All 
comparisons also involve comparing visual displays of estimates of the target's bounding box. 



We first discuss the results of the real video experiments in Sec V-A The results of Monte Carlo 



evaluation using multiple simulated video sequences are described in Sec V-B 



A. Results on Real Video Sequences 

We show experiments on two different video sequences. In both cases, the goal was face tracking. We 
compared our proposed algorithms with several other PF algorithms - PF-MT ll20ll (both using d = 3 
and d = 20), AuxiUary-PF ||52J (with d = 20) and PF-Gordon |^ (with d = 20). All algorithms used 60 
particles for all the videos. As explained earlier and also in ll20ll , PF-Doucet ifTSll cannot be implemented 
because the observation likelihood is not differentiable w.r.t. Ut and hence the posterior mode cannot 
be computed. PF-MT with = 3 (PF-MT-3) is exactly the algorithm used in the experiments of ll20ll ). 
Auxiliary-PF-3 and PF-Gordon-3 were already shown to fail for even simpler sequences in |[20ll , and in 



the simulation experiment that we show in Sec V-B and hence we do not repeat those comparisons here. 



In the first sequence (outdoor- walking), the person walks under a tree and as the leaves of the tree 
move, different amounts of light fall on different parts of her face resulting in high frequency spatial 
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(c) tracking results using PF-MT-3 




t = 15 t = 36 t = 41 t = 48 t = 56 

(f) tracking results using Auxiliary-PF-20 



Fig. 6. Tracking results for the outdoor- walking sequence. We show the comparison for frames 15, 36, 41, 48 and 
56 respectively. 

variation of illumination that also changes w^ith time. As a result, for many frames, many of the higher 
frequency (higher order) Legendre coefficients are also nonzero. We show^ the quantitative location error 
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comparisons in Fig [5] and the bounding box display comparisons in Fig [6j The location error (LE) is 
computed as LE = ||(^^t)[i,2] ~ (^^t) [1,2] lb where Ut contains the translation of the centroid of the hand- 
marked bounding box and its scale w.r.t. the initial template and Ut is the tracked estimate of Ut computed 
as the weighted mean of all the particles of Ut. To get the tracked bounding box shown in Fig|6| {Ut)3 
is used to scale the initial template's bounding box and {Ut)[i^2] is used to translate it. 

As can be seen from either Fig [5] or Fig [6j both PaFiMoCS and PaFiMoCS-slow-support-change are 
able to track the face in the entire sequence. For this sequence, most support changes are slow enough (see 



Fig |3(b)| ) and hence both the PaFiMoCS algorithms had similar performance. On the other hand, both PF- 
MT-20 and PF-MT-3 lose track, though the reasons are different. PF-MT-3 fails because it assumes that 
the illumination image can be accurately represented by only the first 7 Legendre polynomials (d = 3). 
However, as can be seen from Fig |3(c)[ most frames of this sequence do contain significant energy in 
the higher order Legendre coefficients. PF-MT-20 loses track because it does not exploit the sparsity or 
slow sparsity pattern change of and so, with probability one, it results in a dense solution for A^, i.e. 
the energy gets distributed among all components of A^. However, as can be seen from Fig |3(a)[ most 
of the energy of the true A^ lies in only about 50% of the 41 coefficients, while the others are zero or 
very small. Aux-PF-20 and PF-Gordon-20 fail for two reasons. The first is the same as above, they also 
do not exploit sparsity or slow sparsity pattern change. Moreover, as explained in ll20ll , 60 particles is 
too few for all of these to be tracking on a 44 dimensional space. 

The second video sequence (indoor-walking) consisted of a person walking through a corridor across 
a window. The lighting is significantly more when he gets near the window and again lesser when he 
moves away from the window. Moreover, since the window is on the side of the person, the amount 
of light falling on different parts of his face is quite different (higher frequency spatial variations of 
illumination exist) and also changes with time. As can be seen from Fig [7} PF-MT-3 and PF-MT-20 lose 
track within a few frames for reasons similar to those explained above. The illumination pattern changes 
are somewhat sudden when the person moves towards the window from a darker region of the corridor 



or vice versa, and so, as can be seen from Fig 3(b) the number of support changes at these times is 



significantly larger than in the outdoor-walking sequence. As a result PaFiMoCS-slow-support-change 
(Algorithm [3]) begins to lose track around t = 30. However, PaFiMoCS (Algorithm [2]) remains in track 
always. 

All algorithms used 60 particles for all the videos. The face template at the first frame was assumed 
known for all algorithms. For all the videos, the model parameters were set a^ T^u = diag(25, 25, 0.1), 
af = 1000, Pa = 0.06, = 1000. For PaFiMoCS, we used 7 = 0.7 and /3 = 1. These values were 

^If training data is available, these parameters can be estimated by maximum likelihood estimation. 
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(a) tracking results using PaFiMoCS- slow- support-change 

ElSllim 

(b) tracking results using PaFiMoCS 




(c) tracking results using PF-MT-3 




t = 16 t = 20 t = 23 t = 28 t = 30 

(d) tracking results using PF-MT-20 

Fig. 7. Tracking results for the indoor- walking sequence. We show the comparisons for frames 16, 20, 23, 28 and 
30. 

selected after some experimentation using the approach suggested in ITOl Section V-B]. As suggested 
in Q, the support estimation threshold a was chosen as the 99% energy threshold of the estimated 
illumination vector. For PaFiMoCS-slow-support-change, we used 7 = 0.5 and /3 = 1. A smaller value 
of 7 is needed for PaFiMoCS-slow-support-change because of the following reason. In this case we 
do not importance sample on the support. In the mode tracking step we minimize ( [TT] ) which uses the 
previous support particle T^^_i. Hence, a larger number of support additions may be needed in this case 
to get a support estimate that is close to the true T^, i.e. A(2^i may need to be less sparse. 

B. Results on Simulated Video Sequences 

A video sequence of a moving target with spatially varying illumination change was generated as 
follows. We let /q be a given face template. The vectors zq, jo contained its x and y coordinates and zq, jo 
contained the x,y coordinates of its centroid. Starting with = 0, at time time t, the motion vector 
Ut was generated according the random walk model given in ^ with T^u = diag(0.5, 0.5, 0). To keep 
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things simple, we used a zero value for the scale variance, i.e. we simulated only x-y translation of the 
template. The illumination vector was generated as follows. The initial support set Tq contained five 
uniformly randomly selected indices from [1,2,... (2(i+ 1)] with d = 20. Every five frames, the support 
set Tf was changed according to ^ with pa = 0.03, Pr = 0.216. This ensured that the expected support 
size is 5 at each time and the expected support change size is 1 at each time. The illumination vector 
on the support T^, (At)^, was generated according to the random walk model of ^ with af = 0.01 and 
initialized with (Ao)ro = 0- With Ut generated as above, the ROI for the template at time t, ROl(Ut), 



was computed using ( |20| ). The Legendre dictionary with d = 20 was computed using ( [21] ). The observed 
image Yt was then generated using (19) with = 1. As given there, the pixels outside the ROI, i.e. 
those in ROl{Uty, are assumed to be due to clutter and are modeled as being i.i.d. uniformly distributed 
between zero and 255. 

We generated 50 different video sequences using the above approach. An example sequence is shown in 



Fig 1 8 (a) I Each sequence was tracked using PaFiMoCS and PaFiMoCS-slow-support-change with d = 20 
as well as using PF-MT [20| (both using d = 3 and d = 20), AuxiUary-PF ^ (with = 3 and d = 20) 



and PF-Gordon [[T9ll (with d = 3 and d = 20). All algorithms used 100 particles. In Fig |8(b)[ we plot the 
normaUzed mean squared error (NMSE), NMSE(t) := ^'''^EiS/!||jt||A!^t''''' against time. Here Ut and 
At are computed as the weighted means of all the particles of Ut and A^ respectively. The expectation 
is computed by averaging over the 50 Monte Carlo realizations of the sequence. 

As can be seen, PaFiMoCS (Algorithm |2]) remains in track with stable and small error throughout. 
PaFiMoCS-slow-support-change (Algorithm [3]) also remains in track, but its errors are slightly larger 
because occasionally the number of support changes was large. PF-MT-3 ll20l loses track because it 
assumes that only the first 7 Legendre polynomials are sufficient to represent the illumination image. 
However, we know from our simulation that the support of the illumination vector is equally likely to 
contain any element from [1,2,... 41] (not just the first 7). On the other hand, as explained earlier, 
PF-MT-20 loses track because it assumes that A^ is a dense vector, i.e. all of its 41 components are part 
of the support at all times. Aux-PF-20 and Aux-PF-3 as well as of PF-Gordon-20 and PF-Gordon-3 lose 
track due to similar reasons as above, and also because these need many more than 100 particles to even 
track on a 10 dimensional state space. 

All algorithms used 100 particles. For PaFiMoCS, after some experimentation using the approach of 
ifTOl Section V-B], we used 7 = 0.7 and /3 = 0.4. As suggested in Q, the support estimation threshold 
a was chosen as the 99% energy threshold of the estimated illumination vector. For PaFiMoCS-slow- 
support-change, we used 7 = 0.5, /3 = 0.4, and a was again set as above. A smaller value of 7 is used 



for PaFiMoCS-slow-support-change for the same reason as that explained in Sec V-A 
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(a) a simulated video sequence 




5 10 15 20 25 30 35 40 45 50 

Time 



(b) NMSE of ([/t,At) 

Fig. 8. In the NMSE plot, PaFiMoCS-SSC refers to PaFiMoCS-slow-support-change. 

VI. Conclusions and Future Work 
In this work, we studied the problem of tracking a time sequence of sparse spatial signals with changing 

sparsity patterns, e.g. illumination, as well as other unknown states, e.g. motion states, from a sequence of 

nonlinear observations corrupted by (possibly) non-Gaussian noise. A key application where this problem 

occurs is in tracking moving objects across spatially varying illumination change. In this case, the motion 

states form the small dimensional state vector, while the illumination "image" (illumination at each pixel 

in the image) is the sparse spatial signal with slowly changing sparsity patterns. We proposed a novel 

solution approach called Particle Filtered Modified-CS (PaFiMoCS). The key idea of PaFiMoCS is to 

importance sample for the small dimensional state vector, while replacing importance sampling by slow 

sparsity pattern change constrained posterior mode tracking for recovering the sparse spatial signal. We 

studied the illumination-motion tracking problem in detail and showed how to design PaFiMoCS for it. 

Extensive experiments on both simulated as well as on real videos with significant illumination changes 

demonstrated the superiority of PaFiMoCS as compared with existing work. 

Future work will involve designing PaFiMoCS using more sophisticated pose and appearance change 

models from recent work BH . Il44ll . ||45]| . A second goal of future work will be to study other visual 

tracking applications where the above problem occurs, e.g. tracking moving and deforming objects from 

cluttered and/or low contrast imagery. 
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